Serving ML Models with TorchServe

#artificialintelligence

This post walks you through the process of serving your deep learning Torch model with the TorchServe framework. There are quite a few articles on this topic; however, they typically focus either on deploying TorchServe itself or on writing custom handlers and getting the end results. That was my motivation for writing this post: it covers both parts and gives an end-to-end example.
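
The excerpt doesn't include the post's code, but the heart of the handler half is a custom handler class. As a rough sketch of what one looks like (the class name, input parsing, and float-vector payload format below are my own assumptions, not the post's code), a TorchServe handler subclasses BaseHandler and overrides the preprocess/inference/postprocess hooks:

```python
# Sketch of a TorchServe custom handler; MyHandler and the assumed
# float-vector input format are illustrative, not code from the post.
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # TorchServe hands the handler a batch of requests; each row
        # carries the payload under "data" or "body" depending on how
        # the request was sent.
        rows = [row.get("data") or row.get("body") for row in data]
        return torch.tensor(rows, dtype=torch.float32)

    def inference(self, inputs):
        # self.model is loaded by BaseHandler.initialize() from the
        # .mar archive when the model is registered.
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Return one JSON-serializable result per request in the batch.
        return outputs.tolist()
```

The handler file is then packaged alongside the serialized model with torch-model-archiver, and the resulting .mar archive is served with torchserve --start, which is the deployment half the post covers.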


Serving ML Models as Reusable Containers

#artificialintelligence

We want to create a class with methods to download a model from S3 and load it directly into memory; something as simple as the sketch below should suffice for demonstration purposes. Once you've built the image, you can run your container using the following command, changing the model URL to your S3 URL and the Docker image to point to your Docker registry. I've put mine up there just as an example. Now here comes the best part.
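
The class itself isn't shown in this excerpt. A minimal sketch of what it could look like, assuming boto3 for S3 access and a plain torch.load (the ModelLoader name and the bucket/key arguments are illustrative placeholders):

```python
# Sketch of loading a Torch model straight from S3 into memory;
# ModelLoader and the bucket/key names are illustrative assumptions.
import io
import boto3
import torch

class ModelLoader:
    def __init__(self, bucket: str, key: str):
        self.bucket = bucket
        self.key = key
        self.s3 = boto3.client("s3")

    def load(self):
        # Stream the object into an in-memory buffer instead of
        # writing it to disk inside the container.
        buffer = io.BytesIO()
        self.s3.download_fileobj(self.bucket, self.key, buffer)
        buffer.seek(0)
        # Assumes the object was saved with torch.save().
        return torch.load(buffer, map_location="cpu")

# model = ModelLoader("my-bucket", "models/model.pt").load()
```

Loading through an in-memory buffer keeps the image itself model-agnostic: the same container can serve any model whose S3 URL you pass in at run time, which is what makes it reusable.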


Serving ML Models in Production: Common Patterns - KDnuggets

#artificialintelligence

This post is based on Simon Mo's "Patterns of Machine Learning in Production" talk from Ray Summit 2021. Over the past couple of years, we've listened to ML practitioners across many different industries to learn about and improve the tooling around ML production use cases. Through this, we've seen four common patterns of machine learning in production: pipeline, ensemble, business logic, and online learning. In the ML serving space, implementing these patterns typically involves a tradeoff between ease of development and production readiness. Ray Serve was built to support these patterns by being both easy to develop and production-ready. It is a scalable and programmable serving framework built on top of Ray that helps you scale your microservices and ML models in production.
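
The talk's code isn't reproduced in this excerpt, but the pipeline pattern in particular maps naturally onto composing Ray Serve deployments. A minimal sketch (the Preprocessor/Model deployments are placeholders of my own, and the await-on-handle style assumes the DeploymentHandle API from Ray 2.7+):

```python
# Sketch of the pipeline pattern as two composed Ray Serve deployments;
# the deployments and their logic are illustrative, not from the talk.
from ray import serve

@serve.deployment
class Preprocessor:
    def __call__(self, text: str) -> str:
        return text.lower().strip()

@serve.deployment
class Model:
    def __init__(self, preprocessor):
        # When bound below, this arrives as a DeploymentHandle.
        self.preprocessor = preprocessor

    async def __call__(self, http_request) -> dict:
        # The ingress deployment receives a Starlette Request over HTTP.
        text = await http_request.json()
        clean = await self.preprocessor.remote(text)
        return {"num_tokens": len(clean.split())}

app = Model.bind(Preprocessor.bind())
# serve.run(app)  # then: POST a JSON string to http://localhost:8000/
```

Because each deployment scales independently, the same composition style extends to the ensemble and business-logic patterns: fan a request out to several model handles, or wrap the model call in validation and routing code.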